ABSTRACT 

Conventional statistical significance tests do not 
inform the researcher regarding the likelihood that results will 
replicate. One strategy for evaluating result replication is to use a 
"bootstrap" resampling of a study's data so that the stability of 
results across numerous configurations of the subjects can be 
explored. This paper illustrates the use of the bootstrap in a 
canonical correlation analysis. Canonical correlation analysis is the 
most general case of classical general linear model analyses, 
subsuming other univariate and multivariate parametric methods (e.g., 
t-tests, analysis of variance, analysis of covariance, regression, 
multivariate analysis of variance, and discriminant analysis) as 
special cases. A sample of 50 out of 301 subjects from a study by K. 
J. Holzinger and F. Swineford (1939) is used. Since bootstrap 
analyses capitalize during resampling on the commonalities inherent 
in a given sample, they yield somewhat inflated evaluations of 
replicability. However, inflated empirical evaluations of 
replicability are often superior to a mere presumption of 
replicability. Ten tables and one figure present details of the 
analysis. A 63-item list of references and an appendix listing the 50 
analysis cases are included. (Author/SLD) 

ABSTRACT 

Conventional statistical significance tests do not inform the 
researcher regarding the likelihood that results will replicate. 
One strategy for evaluating result replicability is to employ a 
"bootstrap" resampling of a study's data so that the stability of 
results across numerous configurations of the subjects can be 
explored. The present paper illustrates the use of the bootstrap 
in a canonical correlation analysis. Canonical correlation 
analysis is the most general case of classical general linear model 
analyses, subsuming other univariate and multivariate parametric 
methods (e.g., t-tests, ANOVA, ANCOVA, r, regression, MANOVA, and 
discriminant analysis) as special cases. 

The use of statistical significance testing as part of the 
interpretation of empirical research results has historically 
generated considerable debate (Carver, 1978; Huberty, 1987; 
Morrison & Henkel, 1970; Thompson, 1989a, 1989c, 1989e). A series 
of articles on the limits of statistical significance testing has 
even appeared on a seemingly periodic basis in recent editions of 
the American Psychologist (Cohen, 1990; Kupfersmid, 1988; Rosnow & 
Rosenthal, 1989). Thompson (1992c) points out several of the many 
possible objections to overreliance on conventional statistical 
significance testing. Two of these objections are most noteworthy. 

1. Statistical Significance Testing Can Be Tautological 

Even some widely respected authors of prominent methodology 
textbooks at times take internally inconsistent positions with 
respect to the role that conventional statistical significance 
testing should play in analysis (see book reviews by Thompson, 
1987a, 1988d). And some dissertation authors may be 
disproportionately susceptible to excessive awe for significance 
tests (Lacaccia, 1991; Thompson, 1988b). But researchers who have 
had the experience of working with large samples (cf. Kaiser, 1976) 
soon realize that virtually all null hypotheses will be rejected at 
some sample size, since "the null hypothesis of no difference is 
almost never exactly true in the population" (Thompson, 1987b, p. 
14). As Meehl (1978, p. 822) notes, "As I believe is generally 
recognized by statisticians today and by thoughtful social 
scientists, the null hypothesis, taken literally, is always false." 
Thus Hays (1981, p. 293) argues that "virtually any study can be 
made to show significant results if one uses enough subjects." Many 
researchers possess this insight, but somehow do not integrate this 
knowledge into their paradigms for conceptualizing or conducting 
research. Thus, the insight too rarely impacts actual practice. 

Although statistical significance is a function of at least 
seven interrelated features of a study (Schneider & Darcy, 1984), 
sample size is a basic influence on significance. To some extent 
significance tests evaluate the size of the researcher's sample; 
most researchers already know prior to conducting significance 
tests whether the sample in hand is large or small, so these 
outcomes do not always yield understanding that would be lost 
absent a significance test. As Thompson (1992b, p. 436) notes: 

Statistical significance testing can involve a 
tautological logic in which tired researchers, 
having collected data from hundreds of subjects, 
then conduct a statistical test to evaluate whether 
there were a lot of subjects, which the researchers 
already know, because they collected the data and 
know they're tired. This tautology has created 
considerable damage as regards the cumulation of 
knowledge . . . 
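
The dependence of significance on sample size can be made concrete with a short calculation. The sketch below (Python, standard library only) holds a hypothetical correlation fixed at r = .10 and computes the usual test statistic, t = r * ((n - 2) ** .5) / ((1 - r ** 2) ** .5), for increasing n. The effect size and the sample sizes are illustrative assumptions, not values taken from any study cited here.

```python
import math

def t_for_r(r, n):
    """t statistic for testing the null that the population r is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

R_FIXED = 0.10                       # hypothetical, deliberately small effect
SAMPLE_SIZES = [20, 100, 400, 1600]  # hypothetical sample sizes

t_values = [t_for_r(R_FIXED, n) for n in SAMPLE_SIZES]
for n, t in zip(SAMPLE_SIZES, t_values):
    print(f"n = {n:5d}   t = {t:6.3f}")
```

The same r = .10 yields a t below 1 at n = 100 but a t above 4 at n = 1,600, although the effect itself has not changed.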

2. Sole Reliance on Statistical Significance Testing Creates 
Inescapable Dilemmas for Researchers 

Researchers who place an inordinate emphasis on statistical 
significance tests also often confront an inescapable dilemma, 
though most researchers do not recognize (or prefer to ignore) this 
dilemma. All statistical significance tests invoke certain 
assumptions. For example, ANOVA requires pooling the variances of 
the dependent variable across the cells of the design during the 
calculation of the mean square used in the denominator of the 
fixed-effects F-test (Haase & Thompson, 1992). This pooling is 
legitimate if and only if the variances of the dependent variable 
scores in all the cells are essentially equal. This is the well 
known "homogeneity (i.e., equality) of variance" assumption. 

Similarly, as Thompson (1992a) notes, ANCOVA is a three-stage 
analysis in which (a) regression weights for the covariate are 
derived completely ignoring group or cell membership of the 
subjects, (b) predicted dependent variable scores (Ŷ) are computed 
using the weights, and are then subtracted from the actual 
dependent variable scores (Y) of the subjects to yield an "e" score 
(e_i = Y_i - Ŷ_i) for each ith subject, and then (c) an ANOVA is 
conducted using the "e" scores as the dependent variable in place 
of the Y scores. As Loftin and Madison (1991) explain in some 
detail, this process is legitimate if and only if the regression 
equations for predicting Y with the covariate(s) are essentially 
the same, i.e., the "homogeneity of regression" assumption is met. 
Because a single regression equation, a single equation that is 
calculated completely ignoring group membership, is employed to 
statistically adjust the Y scores, this single equation can only 
reasonably be used if the equations for the different groups or 
cells are reasonably comparable; otherwise use of a "pooled" 
regression equation would be inappropriate. 
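
The three stages can be traced in a few lines of code. The following is a simplified sketch that follows the stages exactly as stated above, using hypothetical scores for two groups and one covariate; all data values and variable names are assumptions made for illustration. A full ANCOVA program would also reduce the error degrees of freedom by one for the covariate, which this sketch deliberately leaves out.

```python
# Hypothetical cases: (group, covariate X, dependent variable Y)
cases = [("A", 1.0, 2.1), ("A", 2.0, 2.9), ("A", 3.0, 4.2), ("A", 4.0, 4.8),
         ("B", 1.0, 3.0), ("B", 2.0, 4.1), ("B", 3.0, 4.9), ("B", 4.0, 6.2)]

xs = [x for _, x, _ in cases]
ys = [y for _, _, y in cases]
n = len(cases)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# (a) one regression equation, derived ignoring group membership
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# (b) predicted scores and "e" scores, e_i = Y_i - Yhat_i
e_scores = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# (c) one-way ANOVA with the "e" scores as the dependent variable
groups = {}
for (g, _, _), e in zip(cases, e_scores):
    groups.setdefault(g, []).append(e)
grand_mean = sum(e_scores) / n
ss_between = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                 for v in groups.values())
ss_within = sum((e - sum(v) / len(v)) ** 2
                for v in groups.values() for e in v)
df_between = len(groups) - 1
df_within = n - len(groups)   # unadjusted; see the caveat in the lead-in
f_ratio = (ss_between / df_between) / (ss_within / df_within)
print(f"slope = {slope:.3f}, F({df_between}, {df_within}) = {f_ratio:.2f}")
```

Because stage (a) fits a single slope to all cases, the "e" scores are only meaningful to the extent that the separate group slopes resemble that pooled slope.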

Many researchers use statistical significance testing to 
evaluate both their preliminary methodological assumption 
hypotheses (e.g., the ANOVA homogeneity of variance assumption, the 
ANCOVA homogeneity of regression assumption) and their substantive 
hypotheses (e.g., the mean dependent variable score of the 
treatment group equals that of the control group). These 
researchers hope to not reject the null hypotheses involving 
methodological assumptions (e.g., they want the dependent variable 
variances in the cells to all be equal), while they typically hope 
to reject their substantive hypotheses. But as Thompson (1991c, p. 
504) notes, this creates a dilemma, since 

the same large sample size that yields power against 
Type II error in testing the substantive hypotheses 
of interest in ANCOVA [or ANOVA or the t-test] is 
also going to tend to yield statistically 
significant effects for the preliminary homogeneity 
of regression [or of variance] test. 

Some researchers attempt to escape this dilemma by presuming 
that their methods are robust to the violation of their 
assumptions. This does not generally appear to be the case with 
respect to ANCOVA (Keppel & Zedeck, 1989). And the longstanding 
view that ANOVA was robust to the violation of the homogeneity of 
variance assumption has recently been called into some question, 
thanks to more sophisticated Monte Carlo studies conducted with 
more complicated designs, and with more simulation samples (e.g., 
Rogan & Keselman, 1977; Tomarkin & Serlin, 1986; Wilcox, Charlin & 
Thompson, 1986). 

Alternatives 

Over the years various alternatives that might serve as 
substitutes for or augmentations of statistical significance 
testing have been proposed. For example, Serlin and Lapsley (1985) 
advocated placing an emphasis on confidence intervals, Bayesian 
approaches have been encouraged by some (e.g., Good, 1981), and 
somewhat less serious proposals have been presented by still others 
(Salzman, 1989). 

But some strategies emphasize interpretation based on the 
estimated likelihood that results will replicate. This emphasis is 
compatible with the basic purpose of science: isolating conclusions 
that replicate under stated conditions. Notwithstanding some 
misconceptions to the contrary, conventional statistical 
significance tests do not evaluate the probability that results 
will generalize (Carver, 1978). 

A particularly powerful strategy for evaluating result 
replicability invokes the bootstrap methods developed by Efron and 
his colleagues (cf. Diaconis & Efron, 1983; Efron, 1979; Lunneborg, 
1990). Conceptually, these methods involve copying the data set 
over again and again many, many times into an infinitely large 
"mega" data set. Then hundreds or thousands of different samples 
are drawn from the "mega" file, and results are computed separately 
for each sample and then averaged. 

The method is powerful because the analysis considers so many 
configurations of subjects (including configurations in which a 
subject may be represented several times or not at all) and informs 
the researcher regarding the extent to which results generalize 
across different types of subjects. Lunneborg (1987) has offered 
some excellent computer programs that automate this logic for 
univariate applications; Thompson (1988) provides similar software 
for multivariate applications. Recently, user-friendly PC bootstrap 
software has become available from publishers around the world, 
e.g., the menu-driven program, BOJA, distributed by iecProGAMMA, 
P.O. Box 841, 9700 AV Groningen, The Netherlands.¹ 

All statistical tests invoke four estimates. The first is a 
single statistic estimating a single population parameter 
calculated from the sample data in hand. The remaining three 
estimates are calculated not from the data in hand, but rather from 
entirely different data (i.e., the sampling distribution of the 
estimated parameter) conceptually involving multiple repeated 
samplings of the parameter estimate from a population. These four 
estimates are: (a) the single parameter estimate (e.g., X̄, r) 
derived from a sample believed to be representative of a 
population; (b) the second moment about the mean of multiple 
estimates of the parameter of interest (i.e., the standard 
deviation (SD) of the repeatedly sampled estimates, the standard 
error (SE) of the estimated statistic); (c) the third moment about 
the mean of multiple estimates of the parameter (i.e., the 
coefficient of skewness); and (d) the fourth moment about the mean 
of multiple estimates of the parameter (i.e., the coefficient of 
kurtosis). 

Many researchers recognize the use of the first two statistics 
in their analyses. Thus, researchers using LISREL and EQS analyses 
routinely pay more attention to parameter estimates that are 
greater than the individual standard errors of given estimates. As 
Kerlinger (1986, chapter 12) explains in some detail, test 
statistics also invoke the ratio of a parameter estimate to the 
SE. For example, researchers often use a t-test to evaluate the 
null hypothesis that a mean equals zero. For a sample of size n, 
the SD of the means of infinitely many samples of size n from a 
population in which the mean is zero (i.e., the SE of the mean) 
would be approximately SDx/(n**.5). The test statistic for this 
research situation is calculated as the ratio, X̄ / (SDx / (n**.5)). 
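
As a worked instance of this ratio, the sketch below computes the estimated SE of the mean and the resulting test statistic for a small set of hypothetical scores. The eight data values are invented for illustration only.

```python
import math
import statistics

scores = [0.8, -0.3, 1.4, 0.6, 1.1, -0.2, 0.9, 0.5]  # hypothetical sample

n = len(scores)
mean_x = statistics.mean(scores)
sd_x = statistics.stdev(scores)   # sample SD, with n - 1 in the denominator
se_mean = sd_x / math.sqrt(n)     # SDx / (n ** .5)
t_stat = mean_x / se_mean         # parameter estimate divided by its SE

print(f"mean = {mean_x:.3f}, SE = {se_mean:.3f}, t = {t_stat:.3f}")
```

Only the first two of the four estimates appear explicitly in this calculation; the skewness and kurtosis of the sampling distribution enter later, when the test statistic is referred to a theoretical distribution.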

The use of the third and fourth statistics is not so explicit. 
But when we evaluate the probability of our sample result, p(calculated), 
given an assumption that the null is true, we usually compare our 
result against the α (or the α/2) percentile of the test statistic, 
and the skewness and the kurtosis of this sampling distribution are 
part of what dictates what will be the value of the α%ile of the test 
distribution. Of course, conventional confidence intervals employ 
exactly the same elements as statistical significance testing, and 
do make the use of all four estimates explicitly obvious (Glass & 
Hopkins, 1984, section 11.7). 
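
The conventional interval for a correlation shows how the assumed SE and the assumed normal percentile combine. The sketch below uses the r = .560 and n = 11 reported for the Table 1 data and the Fisher Z approach described in the notes to Tables 1 and 2; math.atanh is used here as an equivalent way to obtain Zr, which is an implementation choice rather than the formula printed in Table 1.

```python
import math

r, n = 0.560, 11                 # values reported for the Table 1 data

z_r = math.atanh(r)              # Fisher's Z, 0.5 * ln((1 + r) / (1 - r))
se_z = 1.0 / math.sqrt(n - 3)    # assumed SE of Zr, 1 / ((n - 3) ** .5)
half_width = 1.960 * se_z        # 97.5%ile of the assumed normal distribution

ci_z = (z_r - half_width, z_r + half_width)
ci_r = (math.tanh(ci_z[0]), math.tanh(ci_z[1]))  # back to the r metric

print(f"SE of Zr = {se_z:.3f}")
print(f"95% CI about Zr: {ci_z[0]:.3f} to {ci_z[1]:.3f}")
print(f"95% CI about r:  {ci_r[0]:.3f} to {ci_r[1]:.3f}")
```

To within rounding, this reproduces the conventional column of Table 2. Because the interval about r includes zero, the conventional test does not reject the null hypothesis at the .05 level.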

However, it is contradictory to be willing to use the sample 
to derive our (a) parameter estimate, and to be unwilling to let 
the sample offer similar insight regarding the (b) SE of our 
estimate, and regarding the (c) skewness and (d) kurtosis of 
sampled estimates. One way to let our data speak regarding the 
latter three estimates is to conduct a bootstrap analysis, i.e., we 
momentarily treat our sample data as if it constituted the 
population and we draw numerous (usually at least a thousand) 
random samples from the sample to infer what the sampling 
distribution looks like. To mimic randomly sampling our data with 
n subjects from the population, we do all our "resampling" from our 
mock population by drawing random samples with replacement from our 
data in hand, and to honor our research situation each resample is 
drawn to also have exactly size n. 
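
The resampling logic just described can be sketched directly. The code below draws resamples of size n with replacement from the 11 cases listed in Table 1 and summarizes the resampled estimates of r by their SD (the empirical SE), skewness, and kurtosis. It is a minimal illustration and not the Lunneborg (1987) program; the function names, the seed, and the moment formulas (population-style moments, with kurtosis expressed so that a normal distribution equals zero) are assumptions, so the results will not match Table 2 exactly.

```python
import math
import random
import statistics

# The 11 (Y, X) pairs listed in Table 1
Y = [.18, .54, -.49, .92, .22, .75, .66, -2.65, -.51, .47, -.09]
X = [.20, 1.88, -.76, .42, .32, -.56, 1.55, -1.21, -.66, -.96, -.21]

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_r(x, y, n_resamples=1000, seed=1):
    """Estimates of r from resamples of size n drawn with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        if len(set(idx)) < 2:
            continue  # r is undefined if one case fills the whole resample
        estimates.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    return estimates

def describe(estimates):
    """Mean, SD (empirical SE), skewness, and kurtosis (normal = 0)."""
    m = statistics.fmean(estimates)
    sd = statistics.pstdev(estimates)
    skew = statistics.fmean([((e - m) / sd) ** 3 for e in estimates])
    kurt = statistics.fmean([((e - m) / sd) ** 4 for e in estimates]) - 3.0
    return m, sd, skew, kurt

r_sample = pearson_r(X, Y)
estimates = bootstrap_r(X, Y)
mean_r, se_r, skew_r, kurt_r = describe(estimates)
print(f"sample r = {r_sample:.3f}")
print(f"bootstrap mean = {mean_r:.3f}, SE = {se_r:.3f}, "
      f"skewness = {skew_r:.3f}, kurtosis = {kurt_r:.3f}")
```

Each resample has exactly n = 11 cases, so any given case may appear several times or not at all within a resample.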

The bootstrap approach can also be employed to yield a variety 
of confidence intervals, which vary as a function of the 
assumptions they make about the sampling distribution. Of course, 
bootstrap and other methods that focus on the invariance or the 
generalizability of results are no more magical than is classical 
statistical significance testing itself. No analytic methods can 
take us beyond the limits of our data. We use methods to let data 
speak in various ways, not to make data more than they can be. 

A Bootstrap Example for the Univariate Case 

The Table 1 data can be used to illustrate a bootstrap 
application and its potential benefits. These estimates were 
developed using the software available from Lunneborg (1987), and 
were based on 1,000 samples with replacement. As reported in Table 
2, the standard deviation of the 1,000 estimates of r was .173; 
this is the empirical estimate of SE_r, and is considerably smaller 
than the estimate of the SE (SE_Zr = .354, SE_r = .339) derived based 
on assumptions.² Figure 1 graphically presents the bootstrap 
results. The bootstrap results were also useful in alerting the 
researcher to the fact that the sampling distribution may not be 
normal, e.g., the distribution may be negatively skewed. 

INSERT TABLES 1 AND 2 AND FIGURE 1 ABOUT HERE. 

The bootstrap approach can be employed to yield a variety of 
confidence intervals, which vary as a function of the assumptions 
they make about the sampling distribution. The three estimates 
calculated by the Lunneborg (1987) program for the Table 1 data are 
reported in Table 2. The "bias corrected" estimate makes the fewest 
assumptions regarding the sampling distribution (Lunneborg, 1987, 
p. 54), that is, relies most upon the empirical findings from 
resampling. Since none of the confidence intervals subsume zero, 
the bootstrap results employing an empirically estimated sampling 
distribution, unlike the conventional approach, yield a 
statistically significant result. 
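
The three kinds of interval differ only in how much they borrow from the resampled estimates. The sketch below implements a normal theory interval, a percentile interval, and Efron's bias-corrected percentile interval. It is an illustration under stated assumptions: the bias-corrected formula used here is the standard one from the bootstrap literature and may differ in detail from the Lunneborg (1987) program, the nearest-rank quantile rule is a simplification, and the resampled estimates are synthetic, negatively skewed values generated only so that the code can run on its own.

```python
import random
from statistics import NormalDist, stdev

STD_NORMAL = NormalDist()

def quantile(sorted_vals, p):
    """Nearest-rank quantile of an already sorted list, 0 <= p <= 1."""
    k = round(p * (len(sorted_vals) - 1))
    return sorted_vals[min(len(sorted_vals) - 1, max(0, k))]

def normal_theory_ci(point, estimates, level=0.95):
    """Symmetric interval: point estimate plus or minus z times empirical SE."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    se = stdev(estimates)
    return point - z * se, point + z * se

def percentile_ci(estimates, level=0.95):
    """Interval read directly from the percentiles of the estimates."""
    s = sorted(estimates)
    alpha = 1 - level
    return quantile(s, alpha / 2), quantile(s, 1 - alpha / 2)

def bias_corrected_ci(point, estimates, level=0.95):
    """Percentile interval shifted by the median bias of the estimates."""
    s = sorted(estimates)
    b = len(s)
    prop_below = sum(e < point for e in s) / b
    prop_below = min(max(prop_below, 1 / (b + 1)), b / (b + 1))  # keep finite
    z0 = STD_NORMAL.inv_cdf(prop_below)
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    return (quantile(s, STD_NORMAL.cdf(2 * z0 - z)),
            quantile(s, STD_NORMAL.cdf(2 * z0 + z)))

# Synthetic, negatively skewed stand-ins for resampled estimates (hypothetical)
rng = random.Random(3)
point_estimate = 0.56
resampled = [0.75 - rng.expovariate(6.0) for _ in range(1000)]

ci_normal = normal_theory_ci(point_estimate, resampled)
ci_percentile = percentile_ci(resampled)
ci_bc = bias_corrected_ci(point_estimate, resampled)
for label, ci in [("normal theory", ci_normal), ("percentile", ci_percentile),
                  ("bias corrected", ci_bc)]:
    print(f"{label:15s} {ci[0]:+.3f} to {ci[1]:+.3f}")
```

With skewed estimates the normal theory interval stays symmetric about the point estimate, while the two percentile-based intervals follow the shape of the resampled distribution.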

Bootstrap Multivariate Methods 

Most of the previous bootstrap software applications have been 
implemented in univariate statistical applications. However, it 
might be argued that such methods would be even more useful in the 
multivariate case, since in theory multivariate methods offer even 
more opportunities to capitalize on sampling error (e.g., Gorsuch, 
1983, p. 330). 

The major barrier to conducting a multivariate bootstrap 
involves the multidimensional character of the "space" in which the 
analysis is conducted. The bootstrap must be applied such that each 
of the hundreds or thousands of resampling results are all located 
in a common factor space before the mean, SD, skewness and kurtosis 
are computed. 

For example, in a factor analysis of population data, the 
first two principal components of IQ data might be "Verbal" and 
"Performance", and the eigenvalues of the two factors prior to 
rotation (Thompson, 1989d) might be 5.5 and 5.4, respectively. In 
various samples from this population the two components might 
emerge very much as the same constructs, but sampling error might 
introduce small variations in the ordering of the two factors 
within the analysis, with "Verbal" being the first factor in some 
solutions but the second factor in other samples. 

If the analyst computed mean structure (or pattern) 
coefficients for the first variable on the first component across 
all the repeated samplings, the mean would be a nonsensical mess 
representing an average of some apples, some oranges, and perhaps 
some kiwi. The sampled solutions must be rotated to best fit 
positions with a common target solution, prior to computing means 
and other statistics across the samples, so that the results are 
reasonable. 
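
The need for a common target can be shown with a two-component case, where the best-fitting orthogonal transformation has a closed form. The sketch below finds the 2 x 2 rotation or reflection R that minimizes the squared distance between a resampled coefficient matrix times R and a target matrix. The coefficient values are hypothetical, the function names are assumptions, and the closed form applies only to two components; solutions with more components require a general Procrustes routine.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def procrustes_2d(sample, target):
    """Orthogonal 2 x 2 matrix R giving the best fit of sample R to target."""
    # m = sample' target, the cross-product of the two coefficient matrices
    m = [[sum(rs[i] * rt[j] for rs, rt in zip(sample, target))
          for j in range(2)] for i in range(2)]
    fit_rotation = math.hypot(m[0][0] + m[1][1], m[1][0] - m[0][1])
    fit_reflection = math.hypot(m[0][0] - m[1][1], m[1][0] + m[0][1])
    if fit_rotation >= fit_reflection:
        if fit_rotation == 0.0:
            return [[1.0, 0.0], [0.0, 1.0]]   # degenerate case: leave as is
        c = (m[0][0] + m[1][1]) / fit_rotation
        s = (m[1][0] - m[0][1]) / fit_rotation
        return [[c, -s], [s, c]]              # proper rotation
    c = (m[0][0] - m[1][1]) / fit_reflection
    s = (m[1][0] + m[0][1]) / fit_reflection
    return [[c, s], [s, -c]]                  # reflection (covers column swaps)

# Hypothetical coefficients: four variables on "Verbal" and "Performance"
target = [[.80, .10], [.70, .20], [.10, .90], [.20, .80]]
# A resampled solution in which the same two constructs emerged in reverse order
resample = [[.12, .78], [.18, .72], [.88, .08], [.83, .22]]

aligned = matmul(resample, procrustes_2d(resample, target))
naive_mean = (target[0][0] + resample[0][0]) / 2
aligned_mean = (target[0][0] + aligned[0][0]) / 2
print(f"variable 1, component I: naive mean = {naive_mean:.2f}, "
      f"mean after rotation = {aligned_mean:.2f}")
```

Averaging before rotation mixes a "Verbal" coefficient with a "Performance" coefficient; averaging after rotation compares like with like.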

The same considerations apply when one is considering 
resampling from sample data in a bootstrap analysis, as against a 
meta-analysis of independent samples from a population (e.g., 
Thompson, 1989b). Several viable candidates for the target used to 
define the common factor space that links results across resamplings 
can be identified. These include: 

(a) a matrix of zeroes, ones, and negative ones, defining a simple 
structure, delineated based on theory; 

(b) a structure or a weight matrix isolated in previous research; 
or 

(c) a structure or a weight matrix for the sample data in hand. 

Bootstrap in the Canonical Case 

The theoretical and the programming difficulties inherent in 
conducting bootstrap analyses with multivariate procedures have 
been overcome as regards factor analysis (Daniel, 1992; Lambert, 
Wildt & Durand, 1990; Thompson, 1988c) and discriminant 
analysis/one-way MANOVA (Lawson & Snyder, 1992). This work is 
noteworthy, since multivariate methods are often vitally important 
in social science research (Fish, 1988). 

Thompson (1986, p. 9) notes that the reality about which most 
researchers wish to generalize is usually one "in which the 
researcher cares about multiple outcomes, in which most outcomes 
have multiple causes, and in which most causes have multiple 
effects." Tatsuoka's (1973, p. 273) previous remarks remain 
telling: 

The often-heard argument, "I'm more interested in 
seeing how each variable, in its own right, affects 
the outcome" overlooks the fact that any variable 
taken in isolation may affect the criterion 
differently from the way it will act in the company 
of other variables. It also overlooks the fact that 
multivariate analysis -- precisely by considering all 
the variables simultaneously -- can throw light on how 
each one contributes to the relation. 

Fish (1988) and Maxwell (in press) both present data illustrating 
how univariate and multivariate analysis of the same data can lead 
to radically different conclusions. 

Although the availability of bootstrap software for factor 
analysis and for discriminant analysis/one-way MANOVA is helpful, 
it would also be useful to be able to bootstrap a canonical 
correlation analysis. Canonical correlation analysis is the most 
general case of classical general linear model analyses, subsuming 
other univariate and multivariate parametric methods (e.g., t- 
tests, ANOVA, ANCOVA, r, regression, MANOVA, and discriminant 
analysis) as special cases (Knapp, 1978; Xitao, 1992). Thompson 
(1988a, 1991b) illustrates these connections using small heuristic 
data sets to make the discussion concrete and accessible. 

The present paper uses data from Holzinger and Swineford 
(1939, pp. 81-91) for heuristic purposes to illustrate a bootstrap 
of a canonical analysis. These cognitive ability data are widely 
available, and have been employed by many authors for similar 
illustrative purposes (e.g., Gorsuch, 1983, passim; Jöreskog & 
Sörbom, 1989, passim). 

The heuristic example assumes two criterion variables, General 
Verbal Ability and Paragraph Comprehension scores, and four 
predictor variables: Speeded Dot Counting, Speeded Discrimination 
of Straight and Curved Capitals, Math Number Series, and 
Woody-McCall Mixed Math Fundamentals scores. Table 3 presents the 
correlation matrix associated with the full data set (N=301). 

INSERT TABLE 3 ABOUT HERE. 

Canonical analysis partitions the correlation matrix into two 
"intradomain" and two "interdomain" quadrants. These four 
submatrices are then manipulated (see Thompson, 1984, pp. 11-16 for 
details) to yield a "quadruple-product matrix". The quadruple 
product matrix for these data is presented in Table 4. 

INSERT TABLE 4 ABOUT HERE. 

The quadruple-product matrix is then subjected to a principal 
components analysis. The eigenvalues of the quadruple-product matrix 
are the squared canonical correlation coefficients (Rc²). The number 
of squared canonical correlation coefficients always equals the number 
of variables in the smaller of the two variable sets, because that 
is the rank of the "quadruple-product" matrix. 
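
The partitioning and the eigenvalue step can be traced numerically from the Table 3 correlations. The sketch below forms a quadruple-product matrix as Ryy^-1 Ryx Rxx^-1 Rxy, with the criterion set (T5, T6) as the smaller set, and extracts its two eigenvalues. The order of multiplication, the helper functions, and the closed-form 2 x 2 eigenvalue solution are assumptions made for this illustration; see Thompson (1984) for the computations actually used.

```python
import math

# Intradomain and interdomain quadrants of the Table 3 matrix (N=301)
R_YY = [[1.0000, .6572],
        [.6572, 1.0000]]                      # T5, T6
R_XX = [[1.0000, .4490, .2615, .3111],
        [.4490, 1.0000, .3322, .2824],
        [.2615, .3322, 1.0000, .4600],
        [.3111, .2824, .4600, 1.0000]]        # T12, T13, T23, T24
R_YX = [[.1649, .2052, .3950, .3933],
        [.1069, .2078, .4516, .4353]]         # rows T5, T6

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

quad = matmul(matmul(inverse(R_YY), R_YX),
              matmul(inverse(R_XX), transpose(R_YX)))

# Eigenvalues of a 2 x 2 matrix from its trace and determinant
trace = quad[0][0] + quad[1][1]
det = quad[0][0] * quad[1][1] - quad[0][1] * quad[1][0]
root = math.sqrt(trace * trace - 4.0 * det)
rc_squared = [(trace + root) / 2.0, (trace - root) / 2.0]
print("squared canonical correlations:",
      ", ".join(f"{v:.5f}" for v in rc_squared))
```

Two eigenvalues are obtained because the smaller variable set has two variables. The larger eigenvalue is approximately .30, in line with the population value of 29.982% cited in the Discussion.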

Since conventional parametric methods are all correlational 
least squares analyses, all such analyses involve weights similar 
to the beta weights generated in regression. These weights are all 
analogous, but are given different names in different analyses 
(e.g., beta weights in regression, pattern coefficients in factor 
analysis, discriminant function coefficients in discriminant 
analysis, and canonical function coefficients in canonical 
correlation analysis), mainly to obfuscate the commonalities of 
parametric methods, and to confuse graduate students. 

All parametric methods also involve the creation of latent or 
synthetic variables analogous to the predicted dependent variable 
in regression (Ŷ). And all analyses can invoke the correlation 
coefficients between an observed and a latent variable (called a 
"structure correlation" or a "structure coefficient") as important 
aids to interpretation (Thompson & Borrello, 1985). Table 5 
presents the canonical function, structure, correlation, and other 
coefficients associated with the canonical analysis of the Table 3 
matrix. 

INSERT TABLE 5 ABOUT HERE. 

Table 6 presents the correlation coefficients for the same 
variables for a random sample (see Appendix A) of 50 of the 301 
subjects in the population. Table 7 presents the canonical 
analysis of these sample data. 

INSERT TABLES 6 AND 7 ABOUT HERE. 

Program CANSTRAP (Thompson, in press) was then employed to 
resample 1,000 samples, each of size 50, from the Appendix A data. 
The resampling procedure in bootstrap typically invokes resamples 
of the same size as the sample itself, to mimic the influences on 
the actual sample size. 

For this heuristic example the random resampling involved a 
mean use of the 50 subjects of 1,000 times each (SD=26.27). The 
smallest number of times a subject was drawn across 1,000 samples 
was 942. The most times a subject was drawn over 1,000 samples was 
1,056. 

Of course, since resampling is done with replacement, a given 
subject may be drawn more than once in a given resampling, or not 
at all. For example, in these analyses subject 8 from Appendix A 
was drawn twice in the first resampling, but subject 9 was not 
drawn at all in this resampling. However, in the second of the 
1,000 resamplings, subject 8 was not drawn at all, but subject 9 
was drawn three times. 
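
The bookkeeping behind these counts is easy to reproduce. The sketch below draws 1,000 resamples of size 50 with replacement from subject numbers 1 to 50 and tallies how often each subject is used. It does not reproduce CANSTRAP or its random number stream; the seed and the variable names are arbitrary assumptions, so the minimum, maximum, and SD will differ from the values reported above.

```python
import random
from collections import Counter
from statistics import fmean, pstdev

N_SUBJECTS = 50
N_RESAMPLES = 1000
rng = random.Random(1993)   # arbitrary seed, for repeatability only

usage = Counter()
first_resample = None
for b in range(N_RESAMPLES):
    # 50 draws with replacement from subject numbers 1..50
    drawn = [rng.randrange(N_SUBJECTS) + 1 for _ in range(N_SUBJECTS)]
    if b == 0:
        first_resample = Counter(drawn)
    usage.update(drawn)

counts = [usage[s] for s in range(1, N_SUBJECTS + 1)]
mean_use = fmean(counts)
omitted_first = [s for s in range(1, N_SUBJECTS + 1) if first_resample[s] == 0]

print(f"mean use = {mean_use:.1f}, SD = {pstdev(counts):.2f}, "
      f"min = {min(counts)}, max = {max(counts)}")
print(f"subjects not drawn in the first resample: {len(omitted_first)}")
```

The mean use is exactly 1,000 by construction, since 1,000 x 50 draws are spread over 50 subjects. Within any single resample, the chance that a given subject is not drawn at all is (49/50)**50, or roughly .36.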

Table 8 presents descriptive statistics for the squared 
canonical correlation coefficients for both functions I and II 
across the 1,000 resamplings. Table 9 presents descriptive 
statistics for the function and structure coefficients for the 
smaller variable set, computed only after first invoking a 
Procrustean rotation of each resampled function coefficient matrix 
to a best fit position with the Table 7 function coefficient 
matrix.³ Table 10 presents the corresponding descriptive statistics 
for the function and structure coefficients for the larger variable 
set, across 1,000 resamplings. 

INSERT TABLES 8, 9 AND 10 ABOUT HERE. 

Discussion 

With respect to bootstrap canonical effect sizes, the mean Rc² 
for function I across 1,000 resamplings was 43.958% (SD=.11744), as 
against the true population value of 29.982%, and the initial 
sample value of 35.574%. The standard deviation (.11744) about 
this mean estimate is an empirical estimate of the standard error 
of the statistic. And the remaining moments about the mean advise 
the researcher that the resampled estimates are not normally 
distributed, as might be otherwise expected. Indeed, both sets of 
estimates are positively skewed, as reported in Table 8. 

The finding that the canonical correlation coefficients are 
somewhat positively biased is fully expected, just as "shrinkage" 
dynamics are expected in regression effects. Indeed, it may be 
useful to invoke the same "shrinkage" corrections employed in 
regression (Fisk, 1991) with the resampled canonical estimates 
(Thompson, 1990). 

In any case, the standard deviations of the resampled 
canonical correlations, akin to standard errors, should be 
carefully considered. For example, in the present study the mean Rc² 
for function I across 1,000 resamplings (43.958%) was within one SE 
(43.958% - 11.744% = 32.214%) of the actual sample result, i.e., 
35.374%. And the resampled result (43.958%) was within two SEs of 
the actual population (29.982%) result in the example analog to a 
true population. 
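
The comparison in SE units can be written out as simple arithmetic. The sketch below uses only the percentages printed in this paragraph; the function name is an assumption.

```python
MEAN_BOOT = 43.958         # mean squared canonical correlation (%), function I
SE_BOOT = 11.744           # SD of the resampled values (%), the empirical SE
SAMPLE_VALUE = 35.374      # actual sample result (%), as printed above
POPULATION_VALUE = 29.982  # population result (%)

def distance_in_se(value, center=MEAN_BOOT, se=SE_BOOT):
    """Distance between a value and the bootstrap mean, in SE units."""
    return abs(center - value) / se

sample_distance = distance_in_se(SAMPLE_VALUE)
population_distance = distance_in_se(POPULATION_VALUE)
print(f"sample result:     {sample_distance:.2f} SE from the bootstrap mean")
print(f"population result: {population_distance:.2f} SE from the bootstrap mean")
```

The sample result lies about 0.73 SE below the bootstrap mean and the population result about 1.19 SE below it, which is the sense in which the two values fall within one and two SEs, respectively.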

The standard errors for the function and structure 
coefficients, presented in Tables 9 and 10, indicate that both 
function and structure coefficients are highly susceptible to 
sampling error. Again, this result is consistent with previous 
Monte Carlo research (Thompson, 1991a). Such results alert the 
researcher to exercise considerable caution when interpreting 
canonical weights and structure coefficients. 

In summary, the business of science is formulating 
generalizable insight. No one study, taken singly, establishes the 
basis for such insight. As Neale and Liebert (1986, p. 290) 
observe: 

No one study, however shrewdly designed and 
carefully executed, can provide convincing support 
for a causal hypothesis or theoretical statement... 
Too many possible (if not plausible) confounds, 
limitations on generality, and alternative 
interpretations can be offered for any one 
observation. Moreover, each of the basic methods of 
research (experimental, correlational, and case 
study) and techniques of comparison (within- or 
between-subjects) has intrinsic limitations. How, 
then, does social science theory advance through 
research? The answer is, by collecting a diverse 
body of evidence about any major theoretical 
proposition. 

Evaluating the generalizability of canonical results is a 
daunting task, but a task which the serious scholar can ill-afford 
to shirk. Such evaluations are important. As Nunnally (1978, p. 
298) notes, "one tends to take advantage of chance in any situation 
[all parametric methods] where something is optimized from the data 
at hand", as in least squares methods, i.e., all conventional 
parametric methods. 

Bootstrap analyses are one vehicle, but an important vehicle, 
for evaluating the replicability of results. The researcher may 
vest more confidence in results that replicate over the numerous 
configurations of subjects created during a bootstrap analysis. 
Since such analyses capitalize during resampling on the 
commonalities inherent in a given sample in hand (e.g., measurement 
at a given point in time, perhaps measurement in a given geographic 
location), such analyses always yield somewhat inflated evaluations 
of replicability. But inflated empirical evaluations of 
replicability are often superior to a mere presumption of 
replicability, especially when the researcher can take this 
capitalization into account during interpretation. 

Footnotes 

¹Examples of such software and the distributors of the software 
include: (a) "Resampling Stats", distributed by Resampling Stats, 
612 N. Jackson, Arlington, VA 22201; (b) "Statistical Calculator", 
distributed by Erlbaum, 27 Palmeira Mansions, Church Road, Hove 
East Sussex BN3 2FA, United Kingdom; (c) SPIDA, distributed on 
behalf of its Australian author by SERC, 1107 NE 45th, Suite 520, 
Seattle, WA 98105; and (d) the menu-driven program, BOJA, 
distributed by iecProGAMMA, P.O. Box 841, 9700 AV Groningen, The 
Netherlands. 

²Strictly speaking, the standard error (SE) of Zr is only 
1/((n-3)**.5) when the population r is zero. Thus, it is actually 
contradictory to calculate SE_Zr based on an assumption that r = 0, 
and to then use SE_Zr to calculate confidence intervals for r ≠ 0 
unless one only wishes to test H0: r = 0. In this case conceptually 
the CI is really being constructed around 0 (and not r), and the 
test is whether the point estimate, r, falls within the interval. 
However, in practice we usually consider this estimation procedure 
to be "close enough". 

³Another viable candidate for the target matrix used to define 
a common factor space would be the eigenvector matrix of the 
quadruple product matrix. 

Table 1 

Hypothetical Data Used to Illustrate Bootstrap 
Evaluation of an Estimate of r 

ID        Y        X
 1      .18      .20
 2      .54     1.88
 3     -.49     -.76
 4      .92      .42
 5      .22      .32
 6      .75     -.56
 7      .66     1.55
 8    -2.65    -1.21
 9     -.51     -.66
10      .47     -.96
11     -.09     -.21

r(YX) = .560
Zr    = .632

Note. Zr = 1.1513 (log10 ((1 + |r|) / (1 - |r|))) 
         = 1.1513 (log10 ((1 + .560) / (1 - .560))) 
         = 1.1513 (log10 (1.560 / .440)) 
         = 1.1513 (log10 (3.541)) 
         = 1.1513 (.549) = .632 
Table 2 

Conventional and Bootstrap Significance Tests 
for r = .560 for the Table 1 Data 

Sampling Statistics/                  Classical Estimates Based    Empirically Based 
Significance Tests                    on Statistical Assumptions   Bootstrap Estimates 

Second Moment of the Sampling 
Distribution 
  SE_Zr                               .354 [a] 
  SE_r                                .339 [b]                     .173 

Third Moment of the Sampling 
Distribution 
  Coefficient of Skewness of r        .000 (assumed)               -.780 

Fourth Moment of the Sampling 
Distribution 
  Coefficient of Kurtosis of r        .000 (assumed)               1.895 

Density of the Sampling Distribution 
  90.0%ile of Z_r                     1.282 (assumed)              1.037 
  95.0%ile of Z_r                     1.645 (assumed)              1.164 
  97.5%ile of Z_r                     1.960 (assumed)              1.324 

95% Confidence Intervals 
  About Z_r                           -.061 to 1.325 [c] 
  About r                             -.060 to 0.868 [d]           +.220 to +.899 [e] 
                                                                   +.188 to +.868 [f] 
                                                                   +.082 to +.822 [g] 

[a] Calculated as SE_Zr = 1 / ((n - 3) ** .5) = 1 / ((11 - 3) ** .5) 
    = 1 / (8 ** .5) = 1 / 2.828 = .354. 
[b] Calculated as SE_Zr = .354 converted back into SE_r = .339. 
[c] Calculated as CI_95% about Z_r 
    = Z_r - (1.960 * SE_Zr) to Z_r + (1.960 * SE_Zr) 
    = .632 - (1.960 * .354) to .632 + (1.960 * .354) 
    = .632 - .693 to .632 + .693. 
[d] The conversion of r expressed as Fisher's Z transform back into r. 
[e] CI_95% calculated using symmetric or normal theory approach. 
[f] CI_95% calculated using percentile method. 
[g] CI_95% calculated using bias corrected method. 
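
The two columns of Table 2 can be contrasted in code. The following is a rough sketch under stated assumptions, not a reproduction of the author's software: it uses NumPy, an arbitrary random seed, and 1,000 resamples of the Table 1 cases drawn with replacement. Because the resamples are random, the bootstrap figures it prints will differ somewhat from those in Table 2. The bias corrected interval is not implemented here.

```python
import numpy as np

# The 11 hypothetical cases from Table 1.
y = np.array([.18, .54, -.49, .92, .22, .75, .66, -2.65, -.51, .47, -.09])
x = np.array([.20, 1.88, -.76, .42, .32, -.56, 1.55, -1.21, -.66, -.96, -.21])
n = len(x)
r = np.corrcoef(x, y)[0, 1]

# Classical route: Fisher's Z, SE = 1 / sqrt(n - 3), then back to r.
z = np.arctanh(r)
se_z = 1.0 / np.sqrt(n - 3)
ci_z = (z - 1.960 * se_z, z + 1.960 * se_z)
ci_r_classical = tuple(np.tanh(ci_z))

# Bootstrap route: resample whole cases (x, y pairs) with replacement.
rng = np.random.default_rng(12345)  # seed is an arbitrary assumption
boot = []
while len(boot) < 1000:
    idx = rng.integers(0, n, size=n)
    xb, yb = x[idx], y[idx]
    if xb.std() == 0 or yb.std() == 0:
        continue  # r is undefined when a resample has no variance
    boot.append(np.corrcoef(xb, yb)[0, 1])
boot = np.array(boot)

se_boot = boot.std(ddof=1)                            # empirical SE of r
ci_sym = (r - 1.960 * se_boot, r + 1.960 * se_boot)   # symmetric interval
ci_pct = tuple(np.percentile(boot, [2.5, 97.5]))      # percentile interval

print("classical CI about r:", ci_r_classical)
print("bootstrap SE of r:", se_boot)
print("symmetric CI:", ci_sym)
print("percentile CI:", ci_pct)
```

Resampling cases rather than individual scores keeps each subject's x and y values paired, which is what allows the correlation to be re-estimated in every resample.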
Table 3 

Correlation Coefficients for "Population" of N=301 
Subjects from the Holzinger and Swineford (1939) study 

                                       T5      T6     T12     T13     T23     T24 
General Verbal                T5   1.0000   .6572   .1649   .2052   .3950   .3933 
Paragraph Comprehension       T6    .6572  1.0000   .1069   .2078   .4516   .4353 
Speeded Dot Counting          T12   .1649   .1069  1.0000   .4490   .2615   .3111 
Speeded Discrimination Caps   T13   .2052   .2078   .4490  1.0000   .3322   .2824 
Math Number Series            T23   .3950   .4516   .2615   .3322  1.0000   .4600 
Woody-McCall Math             T24   .3933   .4353   .3111   .2824   .4600  1.0000 



Table 4 

"Quadruple-Product" Matrix Analyzed in Canonical Analysis 

(N=301) 

1 2 

1 .170 .093 

2 .160 .273 
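
For readers who want to see how a quadruple-product matrix is formed and used, the following is a minimal sketch assuming NumPy. The function names are illustrative, and the partitioning (the first p variables form one set, the remainder the other) is an assumption of this sketch. The matrix is the product of the inverse of R11, R12, the inverse of R22, and R21, and its eigenvalues are the squared canonical correlations.

```python
import numpy as np


def quadruple_product(R, p):
    """Return inv(R11) @ R12 @ inv(R22) @ R21 for a correlation matrix R
    whose first p rows/columns belong to one variable set."""
    R = np.asarray(R, dtype=float)
    R11, R12 = R[:p, :p], R[:p, p:]
    R21, R22 = R[p:, :p], R[p:, p:]
    # solve(A, B) computes inv(A) @ B without forming the inverse explicitly.
    return np.linalg.solve(R11, R12) @ np.linalg.solve(R22, R21)


def squared_canonical_correlations(Q):
    """Eigenvalues of a quadruple-product matrix, largest first."""
    vals = np.linalg.eigvals(np.asarray(Q, dtype=float))
    return np.sort(vals.real)[::-1]


# The 2 x 2 matrix as printed in Table 4.
table4 = [[.170, .093],
          [.160, .273]]
rc2 = squared_canonical_correlations(table4)
print(rc2)
```

For the printed entries the eigenvalues come out near .354 and .089, which are close to the Rc2 values of 35.374% and 8.893% reported in Table 7.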



Table 5 

Canonical Function and Structure Coefficients for Population Data 

(N=301) 

                       Function I                         Function II 
Variable/                          Squared                            Squared 
Coef.        Function  Structure  Structure    Function  Structure  Structure       h² 
T5            0.36902    0.84089    70.710%    -1.27437   -0.54120    29.290%  100.00% 
T6            0.71803    0.96054    92.264%     1.11563    0.27814     7.736%  100.00% 
Adequacy                            81.487%                           18.513% 
Redundancy                          24.431%                            0.165% 
Rc²                                 29.982%                            0.893% 
Redundancy                          12.498%                            0.234% 
Adequacy                            41.686%                           26.252% 
T12          -0.13193    0.25128     6.314%    -1.06970   -0.96136    92.421%   98.74% 
T13           0.11157    0.41088    16.882%     0.07567   -0.31409     9.865%   26.75% 
T23           0.59236    0.85842    73.688%     0.24321    0.00453     0.002%   73.69% 
T24           0.57288    0.83581    69.858%     0.03461   -0.16491     2.720%   72.58% 



Table 6 

Correlation Coefficients for Sample of n=50 
Subjects from the Holzinger and Swineford (1939) study 

                                       T5      T6     T12     T13     T23     T24 
General Verbal                T5   1.0000   .6440  -.0399   .0762   .3892   .4297 
Paragraph Comprehension       T6    .6440  1.0000   .0703   .2587   .5461   .4064 
Speeded Dot Counting          T12  -.0399   .0703  1.0000   .3847   .2512   .3705 
Speeded Discrimination Caps   T13   .0762   .2587   .3847  1.0000   .3963   .2896 
Math Number Series            T23   .3892   .5461   .2512   .3963  1.0000   .5426 
Woody-McCall Math             T24   .4297   .4064   .3705   .2896   .5426  1.0000 



Table 7 

Canonical Function and Structure Coefficients for Sample Data 

(n=50) 

                       Function I                         Function II 
Variable/                          Squared                            Squared 
Coef.        Function  Structure  Structure    Function  Structure  Structure       h² 
T5            0.36603    0.83247    69.301%    -1.25488   -0.55407    30.699%  100.00% 
T6            0.72426    0.95999    92.158%     1.08819    0.28002     7.841%  100.00% 
Adequacy                            80.729%                           19.270% 
Redundancy                          28.557%                            1.714% 
Rc²                                 35.374%                            8.893% 
Redundancy                          13.526%                            1.780% 
Adequacy                            38.238%                           20.010% 
T12          -0.31827    0.06100     0.372%     0.43648    0.42443    18.014%   18.39% 
T13           0.06941    0.36195    13.101%     0.50749    0.62334    38.855%   51.96% 
T23           0.69730    0.90461    81.832%     0.55004    0.35494    12.598%   94.43% 
T24           0.47876    0.75926    57.648%    -0.93233   -0.32518    10.574%   68.22% 
Table 8 

Descriptive Statistics for Rc² Across 1,000 Resamplings 

                    Function 
Statistic           I          II 
Mean          0.43958     0.13158 
SD            0.11744     0.07873 
Skewness      0.23703     0.85067 
Kurtosis     -0.09252     0.60673 

Note. Program CANSTRAP algorithm 1, computation of Rc² independent of 
Procrustean rotation of the resampled canonical function matrices, 
selected for this analysis. 
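
The kind of summary reported in Table 8 can be sketched as follows. This is an illustration under several assumptions rather than the CANSTRAP program itself: it assumes NumPy, it uses synthetic scores in place of the 50 analysis cases (which are not reproduced here), the names `rc2_from_data` and `describe` are hypothetical, and the skewness and kurtosis formulas are simple moment-based versions that may differ from those the program uses. No Procrustean rotation is applied, because only the squared canonical correlations are summarized.

```python
import numpy as np


def rc2_from_data(X, Y):
    """Squared canonical correlations between the columns of X and Y."""
    p = X.shape[1]
    R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
    R11, R12 = R[:p, :p], R[:p, p:]
    R21, R22 = R[p:, :p], R[p:, p:]
    Q = np.linalg.solve(R11, R12) @ np.linalg.solve(R22, R21)
    return np.sort(np.linalg.eigvals(Q).real)[::-1]


def describe(v):
    """Mean, SD, and moment-based skewness and excess kurtosis of a vector."""
    d = v - v.mean()
    m2 = (d ** 2).mean()
    return {
        "mean": v.mean(),
        "sd": v.std(ddof=1),
        "skewness": (d ** 3).mean() / m2 ** 1.5,
        "kurtosis": (d ** 4).mean() / m2 ** 2 - 3.0,
    }


# Synthetic stand-in data: 50 cases, 2 + 4 variables sharing one latent factor.
rng = np.random.default_rng(0)  # arbitrary seed
n = 50
latent = rng.normal(size=(n, 1))
X = latent + rng.normal(size=(n, 2))
Y = 0.5 * latent + rng.normal(size=(n, 4))

# Resample whole cases with replacement and re-estimate Rc² each time.
boot = np.array([
    rc2_from_data(X[idx], Y[idx])
    for idx in (rng.integers(0, n, size=n) for _ in range(1000))
])
summary = [describe(boot[:, j]) for j in range(boot.shape[1])]
for j, stats in enumerate(summary, start=1):
    print(f"Function {j}:", {k: round(float(s), 5) for k, s in stats.items()})
```

With real data, `X` and `Y` would hold the scores on the smaller and larger variable sets, and the printed rows would play the role of the columns of Table 8.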



Table 9 

Descriptive Statistics for Function and Structure Coefficients 
for the Smaller Variable Set Across 1,000 Resamplings 

Function Coefficients 
*** MEANS 

1 0.4698 -1.1654 

2 0.3082 1.0624 
*** SDs 

1 0.3643 0.3221 

2 0.7101 0.2740 
*** SKEWNESSs 

1 1.0118 3.0347 

2 -1.2058 -0.4001 
*** KURTOSISs 

1 0.5090 12.8658 

2 0.0569 0.3770 

Structure Coefficients 
*** MEANS 

1 0.6552 -0.4587 

2 0.6218 0.3074 
*** SDs 

1 0.4766 0.3654 

2 0.6555 0.2994 
*** SKEWNESSs 

1 -1.9942 1.5918 

2 -1.8525 0.1646 
*** KURTOSISs 

1 2.4850 2.8181 

2 1.7368 0.1547 




Table 10 

Descriptive Statistics for Function and Structure Coefficients 
for the Larger Variable Set Across 1,000 Resamplings 

Function Coefficients 
*** MEANS 

1 -0.2394 0.3141 

2 -0.0147 0.3504 

3 0.3953 0.4304 

4 0.4196 -0.6641 
*** SDs 

1 0.3332 0.4803 

2 0.2755 0.3996 

3 0.5256 0.3948 

4 0.3677 0.4173 
*** SKEWNESSs 

1 0.1812 -0.3334 

2 0.1025 -0.6003 

3 -1.2935 -0.7817 

4 -0.6616 1.1365 
*** KURTOSISs 

1 -0.4745 -0.3368 

2 -0.0576 0.2432 

3 0.7095 0.9570 

4 0.5870 1.4635 



Structure Coefficients 
*** MEANS 

1 0.0146 0.2913 

2 0.1574 0.4520 

3 0.5515 0.3099 

4 0.5407 -0.2115 
*** SDs 

1 0.2872 0.4174 

2 0.3872 0.2901 

3 0.6003 0.2962 

4 0.4675 0.3532 
*** SKEWNESSs 

1 -0.1697 -0.5314 

2 -0.6357 -0.7588 

3 -1.7525 -0.0483 

4 -1.9404 0.4557 
*** KURTOSISs 

1 -0.1597 -0.2460 

2 -0.3010 0.6731 

3 1.5082 -0.1744 

4 2.4797 -0.2703 
Figure 1 

Bootstrap Estimates of r Based on 1,000 Random Resamplings 

Note. Each asterisk represents approximately eight cases. The distribution of 1,000 bootstrap 
estimates of r is presented to the left, while the distribution of the Fisher's Z transformation 
of these 1,000 estimates is presented to the right. The normal distribution of samples of Z_r, 
expected given the classical statistical assumptions that sampling error is distributed normally 
about the estimate, is also presented in the histogram on the right. 
APPENDIX A: 
Random Sample of n=50 Cases from N=301 
(Holzinger & Swineford, 1939, pp. 81-91) 